
Learning Deep Features for Scene Recognition using Places Database

Neural Information Processing Systems

Scene recognition is one of the hallmark tasks of computer vision, allowing definition of a context for object recognition. Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success. This may be because current deep features trained from ImageNet are not competitive enough for such tasks. Here, we introduce a new scene-centric database called Places with over 7 million labeled pictures of scenes. We propose new methods to compare the density and diversity of image datasets and show that Places is as dense as other scene datasets and has more diversity. Using a CNN, we learn deep features for scene recognition tasks, and establish new state-of-the-art results on several scene-centric datasets. A visualization of the CNN layers' responses allows us to show differences in the internal representations of object-centric and scene-centric networks.


Review for NeurIPS paper: Self-supervised learning through the eyes of a child

Neural Information Processing Systems

Weaknesses: - I expected the linear evaluation performance on ImageNet to be impressive. However, the transfer learning performance is disappointingly poor, reaching only 20.9% at best (TC-S). This seriously limits the impact of the work. If the model performs well only on easy datasets that are close to SAYCam, there is little benefit in learning from such data, especially when so many large datasets are available. The authors could replace SAYCam with a standard video dataset (e.g., Charades) and check whether transfer learning performance improves.

